-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rand_distr: Add Zipf distribution #1136
Conversation
cc @kaimast |
I did not compare to the reference implementation in numpy yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth comparing this? https://jasoncrease.medium.com/rejection-sampling-the-zipf-distribution-6b359792cffa
On a first look, I'm not sure this method is different, but I would have to do the math to be sure. |
@dhardy The parametrization of the Zipf distribution you linked is different:
Which paramtrization should I implement? |
Sorry, I don't feel qualified to answer that. |
I also found an R library using the The Wikipedia calls |
This follows the naming convention on Wikipedia.
It seems like this is not true [1], at least not with the same exponent. [1] https://en.wikipedia.org/wiki/Zeta_distribution
Also inline the distribution methods.
This should improve performance slightly.
@dhardy I chose to implement both distributions. |
Arguably, the Zipf distribution is related to the Pareto distribution [1]. [1] https://en.wikipedia.org/wiki/Zipf's_law#Related_laws
Hi @vks! Looking at the previous comments you had quite a journey with names (and yeah, it is not very clear). But you got it right! I have looked only at the implementation of Naming I had a look at some references and leave a summary here. Maybe this is useful for the documentation, but I will leave it to you.
References |
Perfect! I understand the reasoning. I would suggest saying something in the documentation about returning floats for integer random variables, but that deserves its own discussion and PR. This would involve investigating what is the effective domain of our implementations... information which is valuable for the user, but hard to get (or depends on parameters). |
There was some discussion in #987 and #1093, which resulted in some documentation in the Rand book. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the comments about Zipf.
I have some disagreements with some of the computations, I tried to explain the reasoning in each case. Feel free to ask for more clarification.
Notably, I also propose to introduce one more error variant NTooBig
.
Another case we might want to consider: For large |
Because this PR is already quite large, I would like to leave the extended distribution tests for a follow-up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhardy we are finished with the review and agree on adding distributional tests in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. On that basis I approve merging. And thanks for stepping in to review.
@saona-raimundo Thanks for the extensive review! @dhardy Should I squash before merging? |
@vks can't say I care too much whether or not it gets squashed. The reason I didn't merge myself is because I wasn't quite sure whether you were ready to. |
Fixes #1069.